Robust Automatic Speech Recognition Features using Complex Wavelet Packet Transform Coefficients

Authors

  • Tjong Wan Sen
  • Bambang Riyanto Trilaksono
  • Arry Akhmad Arman
Abstract

To improve the performance of phoneme-based Automatic Speech Recognition (ASR) in noisy environments, we developed a new technique that adds robustness to clean phoneme features. These robust features are obtained from Complex Wavelet Packet Transform (CWPT) coefficients. Because the CWPT coefficients represent all the frequency bands of the input signal, decomposing the input signal into a complete CWPT tree also covers all frequencies involved in the recognition process. For time-overlapping signals with different frequency content, e.g. a phoneme signal with noise, the CWPT coefficients are the combination of the CWPT coefficients of the phoneme signal and those of the noise. The CWPT coefficients of the phoneme signal therefore change according to the frequency components contained in the noise. Since the number of phonemes in any language is relatively small (limited) and already well known, principal component vectors can easily be derived from a clean training dataset using Principal Component Analysis (PCA). These principal component vectors can then be used to add robustness and minimize the effects of noise in the testing phase. Simulation results, using the Alpha Numeric 4 (AN4) database from Carnegie Mellon University and NOISEX-92 examples from Rice University, showed that this new technique can be used as a feature extractor that improves the robustness of phoneme-based ASR systems in various adverse noisy conditions while still preserving performance in clean environments.
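The two ideas in the abstract, a full packet-tree decomposition and a PCA subspace learned from clean data, can be illustrated with a small sketch. It is not the authors' implementation, and several things in it are assumptions made for illustration: a real Haar filter pair stands in for the complex (CWPT) filter bank, a synthetic tone stands in for a phoneme, and "adding robustness" is read as projecting noisy features onto the clean principal subspace. All function names (`haar_packet_leaves`, `features`, `principal_components`, `project`) are hypothetical.

```python
import math
import random

def haar_packet_leaves(signal, depth):
    """Full wavelet packet tree; returns the 2**depth leaf bands.

    Assumption: a real Haar filter pair is used as a stand-in.
    This is NOT the complex (CWPT) filter bank from the paper.
    """
    scale = 1.0 / math.sqrt(2.0)
    nodes = [list(signal)]
    for _ in range(depth):
        children = []
        for node in nodes:
            pairs = [(node[i], node[i + 1]) for i in range(0, len(node) - 1, 2)]
            children.append([(a + b) * scale for a, b in pairs])  # low-pass half
            children.append([(a - b) * scale for a, b in pairs])  # high-pass half
        nodes = children  # every node is split again, so the tree is complete
    return nodes

def features(signal, depth=3):
    """Concatenate all leaf coefficients into one feature vector."""
    return [c for band in haar_packet_leaves(signal, depth) for c in band]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def principal_components(rows, k, rng, iters=100):
    """Mean and top-k principal directions via power iteration with deflation."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - mean[j] for j in range(d)] for r in rows]
    comps = []
    for _ in range(k):
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]  # random start vector
        for _ in range(iters):
            # Multiply by the (unnormalized) covariance without forming it.
            scores = [dot(c, v) for c in centered]
            w = [sum(s * c[j] for s, c in zip(scores, centered)) for j in range(d)]
            norm = math.sqrt(dot(w, w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        comps.append(v)
        # Remove the found direction before searching for the next one.
        centered = [[c[j] - dot(c, v) * v[j] for j in range(d)] for c in centered]
    return mean, comps

def project(x, mean, comps):
    """Project x onto the affine subspace mean + span(comps)."""
    centered = [xi - m for xi, m in zip(x, mean)]
    out = list(mean)
    for v in comps:
        s = dot(centered, v)
        out = [o + s * vj for o, vj in zip(out, v)]
    return out

# Synthetic demo: a tone plays the role of a clean "phoneme" (an assumption).
rng = random.Random(0)
N = 32
tone = [math.sin(2 * math.pi * 3 * n / N) for n in range(N)]

# Clean training set: the same tone at 21 amplitudes from 0.5 to 1.5.
train = [features([a * t for t in tone]) for a in (0.5 + 0.05 * i for i in range(21))]
mean, comps = principal_components(train, k=1, rng=rng)

noise = [rng.gauss(0.0, 0.3) for _ in range(N)]
noisy = [t + e for t, e in zip(tone, noise)]
clean_feat, noise_feat, noisy_feat = features(tone), features(noise), features(noisy)

# For a linear transform, "combination" is a plain sum of coefficients.
linearity_gap = max(abs(y - (c + e))
                    for y, c, e in zip(noisy_feat, clean_feat, noise_feat))

restored = project(noisy_feat, mean, comps)
err_before = math.dist(noisy_feat, clean_feat)
err_after = math.dist(restored, clean_feat)
print(f"linearity gap: {linearity_gap:.2e}")
print(f"feature error before/after projection: {err_before:.3f} / {err_after:.3f}")
```

In this toy setting the clean features vary along a single direction, so one principal component is enough and the projection discards every noise component outside it. Real phoneme features would need more components, and how many to keep is a design choice the abstract does not specify.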

Similar resources

Robust Speech Recognition Using Wavelet Coefficient Features

We propose a new vein of feature vectors for robust speech recognition that use denoised wavelet coefficients. Greater robustness to unexpected additive noise or spectrum distortions begins with more robust acoustic features. The use of wavelet coefficients is motivated by human acoustic process modelling and by the ability of wavelet coefficients to capture important time and frequency feature...

Recognition of stress in speech using wavelet analysis and Teager energy operator

The automatic recognition and classification of speech under stress has applications in behavioural and mental health sciences, human to machine communication and robotics. The majority of recent studies are based on a linear model of the speech signal. In this study, the nonlinear Teager Energy Operator (TEO) analysis was used to derive the classification features. Moreover, the TEO analysis w...

A New Algorithm for Voice Activity Detection Based on Wavelet Packets (RESEARCH NOTE)

Speech constitutes much of the communicated information; most other perceived audio signals do not carry nearly as much information. Indeed, many of the non-speech signals may be classified as ‘noise’ in human communication. The process of separating conversational speech and noise is termed voice activity detection (VAD). This paper describes a new approach to VAD which is based on the Wavelet ...

New Filter Structure based on Admissible Wavelet Packet Transform for Text-Independent Speaker Identification

Identical acoustic features like Mel frequency cepstral coefficients (MFCC) and linear predictive cepstral coefficients (LPCC) are widely used for different tasks such as speech recognition and speaker recognition, although the requirements of speaker recognition differ from those of speech recognition. In MFCC feature representation, the Mel frequency scale is used to get a high resolutio...

A Comparison of Visual Features for Audio-Visual Automatic Speech Recognition

The use of visual information from the speaker’s mouth region has been shown to improve the performance of Automatic Speech Recognition (ASR) systems. This is particularly useful in the presence of noise, which even in moderate form severely degrades the speech recognition performance of systems using only audio information. Various sets of features extracted from the speaker’s mouth region have been used to im...

Publication date: 2010